Abstract

Suppose that a rooted tree T is given for preprocessing. The level-ancestor
problem is to answer quickly queries of the following form: given a vertex v and an
integer i > 0, find the ith vertex on the path from the root to v. Algorithms that
achieve a linear time bound for preprocessing and a constant time bound for a query
have been published by Dietz (1991), Alstrup and Holm (2000), and Bender and
Farach (2002). The first two algorithms address dynamic versions of the problem;
the last addresses only the static version and is the simplest so far. The purpose
of this note is to expose another simple algorithm, derived from a complicated
PRAM algorithm by Berkman and Vishkin (1990, 1994). We further show some easy
extensions of its functionality, adding queries for descendants and level successors
as well as ancestors, extensions for which the previously known algorithms are less
suitable.
1 Introduction

The level-ancestor problem is defined as follows. Suppose that a rooted tree T is given
for preprocessing. Answer quickly queries of the following form: given a vertex v and
an integer i, find the ancestor of v in T whose level is i, where the level of the root is 0.

Two related tree queries are: Level Successor: given v, find the next vertex (in
preorder) on the same level. Level Descendant: given v and i, find the first descendant
of v on level i (if one exists).
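For testing the faster structures developed below, it helps to have the three queries in executable form. The following Python sketch is a brute-force reference, not part of the algorithm described in this note: it assumes vertices numbered 0 through n − 1, a parent array with −1 marking the root, and children ordered by increasing number (conventions chosen for illustration), and every function name is hypothetical. Each query takes O(n) time.

```python
def levels_and_preorder(parent):
    """Levels of all vertices and the preorder sequence of the tree.

    Assumes vertices 0..n-1, parent[root] == -1, and children visited in
    increasing vertex order.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = -1
    for v, p in enumerate(parent):
        if p == -1:
            root = v
        else:
            children[p].append(v)
    level = [0] * n
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for c in reversed(children[v]):  # reversed, so the smallest child is popped first
            level[c] = level[v] + 1
            stack.append(c)
    return level, order


def naive_level_ancestor(parent, level, v, i):
    """Ancestor of v on level i (assumes 0 <= i <= level[v])."""
    while level[v] > i:
        v = parent[v]
    return v


def naive_level_successor(order, level, v):
    """Next vertex after v in preorder on the same level, or None."""
    for u in order[order.index(v) + 1:]:
        if level[u] == level[v]:
            return u
    return None


def naive_level_descendant(order, level, v, i):
    """First descendant of v on level i (assumes i > level[v]), or None.

    The subtree of v is the contiguous run of the preorder that starts right
    after v and ends before the first vertex whose level is <= level[v].
    """
    for u in order[order.index(v) + 1:]:
        if level[u] <= level[v]:
            break
        if level[u] == i:
            return u
    return None
```

For the parent array [-1, 0, 0, 1, 1, 2, 3], the preorder is 0, 1, 3, 6, 4, 2, 5; the level-1 ancestor of 6 is 1, the level successor of 4 is 5, and the first level-3 descendant of 1 is 6.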

The level-ancestor problem is a relative of the better-known LCA (Least Common 
Ancestor) problem. In their seminal paper on LCA problems [12], Harel and Tarjan 
solve the level ancestor problem on certain special trees as a subroutine of an LCA 
algorithm. An application of the Level Ancestor problem is mentioned already in [T], 
although an implementation of this data structure had not yet been published at the 
time. 

The first published algorithms for the level ancestor problem were a PRAM algo-
rithm by Berkman and Vishkin [6, 7], and a serial (RAM) algorithm by Dietz [8] that
accommodates dynamic updates. Alstrup and Holm [3] gave an algorithm that solves
an extended dynamic problem, and has the additional advantage that its static-only
version is simpler than the previous algorithms. Finally, the simplest algorithm, for
the static problem only, was given by Bender and Farach [5].

It is curious that very complicated algorithms to address theoretical challenges, 
namely dynamization and parallelization, had been published for this problem earlier 
than any simple algorithm for the most basic and useful variant (static, on serial RAM). 
It is also curious that the essential ideas for such an algorithm do appear in Berkman and 
Vishkin's solution but this potential contribution was missed, since they concentrated on 
the PRAM problem, for which they gave a notoriously impractical algorithm (involving
a table of almost 2^ entries). The first goal of this paper is to rectify this situation
by presenting a sequential algorithm based on the approach of Berkman and Vishkin. 
This is not done just for historical interest, but because the algorithm here presented 
is simply useful: it is efficient and easy to implement (and has been implemented). 
Furthermore, we shall present a few useful extensions that were either unsupported by 
previous work, or supported in much more complicated ways. Specifically, we show 
how to accommodate level successor and level descendant queries, in addition to level 
ancestor. Together, these two queries are useful for iterating over the descendants of a 
vertex at a given level. For example applications of the extension, see [HI [15]. 

Technical remarks. Since we only consider data structures that support O(1)-time
queries, we refer to the algorithms by the preprocessing cost. That is, an O(n)-time
algorithm means linear-time preprocessing. The input to the algorithm is a tree T
whose precise representation is of little consequence (since standard representations are
interchangeable within linear time). We assume that vertices are identified by the numbers
0 through n − 1.

2 The Euler Tour and the Find-Smaller problem 

Like the better-known LCA algorithm that also originates from [6], this Level Ancestor
algorithm is based on the following key ideas:

• The Euler Tour representation of a tree reduces the problem to a problem on a
linear array.

• A data structure with O(n log n) preprocessing time (and size) is given for this
problem.

• This solution is improved to linear-time preprocessing and size using the microset
technique [10, 12].

The microset technique is also used in other work on level ancestors [U EJ [13] [H], but
they all apply at least part of the processing to the tree, using various methods of
decomposition into subtrees. Here, all processing is applied to the Euler-tour array.

Consider a tree T = (V, E), rooted at some vertex r. For each edge (v → u) in
T, add its anti-parallel edge (u → v). This results in a directed graph H. Since the
in-degree and out-degree of each vertex of H are the same, H has an Euler tour that
starts and ends in the root r of T. Note that the tour consists of 2(n − 1) arcs, hence
2n − 1 vertices including the endpoints.

By a straightforward application of DFS on T we can compute the following infor-
mation:

1. An array E[0..2n − 2] such that E[i] is the ith vertex on the Euler tour.

2. An array L[0..2n − 2] such that L[i] is the level of the ith vertex on the Euler tour.

3. An array R[0..n − 1] such that R[v] is the index of the last occurrence of v in the
array E, called the representative of v.
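The single DFS pass can be sketched as follows. This is an illustrative Python version, assuming a parent array with −1 at the root and children visited in increasing vertex order; the function name is hypothetical, and an explicit stack replaces recursion so that deep trees do not hit Python's recursion limit.

```python
def euler_tour_arrays(parent):
    """Return (E, L, R) for the tree given by a parent array (parent[root] == -1).

    E[k]: k-th vertex of the Euler tour (2n - 1 entries), L[k]: its level,
    R[v]: index of the last occurrence of v in E (the representative of v).
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    root = -1
    for v, p in enumerate(parent):
        if p == -1:
            root = v
        else:
            children[p].append(v)
    E, L = [], []
    R = [0] * n
    # Each stack entry is (vertex, level, index of the next child to visit).
    stack = [(root, 0, 0)]
    while stack:
        v, lev, k = stack.pop()
        E.append(v)               # v is (re)visited: record it and its level
        L.append(lev)
        R[v] = len(E) - 1         # overwritten until the final visit of v
        if k < len(children[v]):
            stack.append((v, lev, k + 1))               # return to v after this child
            stack.append((children[v][k], lev + 1, 0))  # descend into the k-th child
    return E, L, R
```

For the parent array [-1, 0, 0, 1, 1, 2, 3] it produces the 13-entry tour E = [0, 1, 3, 6, 3, 1, 4, 1, 0, 2, 5, 2, 0], whose levels L differ by exactly one between neighbors.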



Observation 1 Let l < level(v). Vertex u is the level-l ancestor of vertex v if and
only if u is the first vertex after the last occurrence of v in the Euler tour such that
level(u) ≤ l.

By this observation, the computation of the arrays E, L and R reduces the level-ancestor
problem to the following problem.

FIND-SMALLER (FS) Problem.

Input for preprocessing: Array $A = (a_0, a_1, \ldots, a_{n-1})$ of integers.

Query: Let $0 \le i < n$ and $x \in \mathbb{Z}$. A query $\mathrm{FS}_A(i, x)$ seeks the minimal $j \ge i$ such that
$a_j \le x$. If no such $j$ exists, the answer is 0.

Our goal is to preprocess the array A so that each query $\mathrm{FS}_A(i, x)$ can be processed in
O(1) time.
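Observation 1 already yields a correct, if slow, level-ancestor query once FS is available. The sketch below uses a linear-scan FS as a placeholder for the constant-time structure developed in the following sections; it returns None instead of 0 when no j exists (our convention, to avoid confusion with index 0), and the Euler-tour arrays are hard-coded for the example tree with parent array [-1, 0, 0, 1, 1, 2, 3]. Function names are hypothetical.

```python
def fs_naive(A, i, x):
    """Find-Smaller by linear scan: the minimal j >= i with A[j] <= x, or None."""
    for j in range(i, len(A)):
        if A[j] <= x:
            return j
    return None


def level_ancestor_via_fs(E, L, R, v, l):
    """Level-l ancestor of v, for 0 <= l <= level(v), via Observation 1."""
    # L[R[v]] = level(v) > l whenever l < level(v), so starting the search at
    # R[v] is the same as starting right after the last occurrence of v.
    j = fs_naive(L, R[v], l)
    return E[j]


# Euler-tour arrays of the tree with parent array [-1, 0, 0, 1, 1, 2, 3].
E = [0, 1, 3, 6, 3, 1, 4, 1, 0, 2, 5, 2, 0]
L = [0, 1, 2, 3, 2, 1, 2, 1, 0, 1, 2, 1, 0]
R = [12, 7, 11, 4, 6, 10, 3]
```

Starting the search at R[v] rather than R[v] + 1 is harmless, and for l = level(v) the query simply returns v itself.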

The Euler tour implies that successive elements of array L differ by exactly one.
Therefore, for our goal, it suffices to solve the following restricted problem:

(±1)FS is the Find-Smaller problem restricted to arrays A where, for all i, $|a_i - a_{i+1}| = 1$.

We remark that the general Find-Smaller problem cannot be solved with O(1) query
time if one requires a polynomial-space data structure and assumes a polylogarithmic
word length; the reason is that the static predecessor problem, for which non-constant
lower bounds are known [1], can easily be reduced to it.

Another preparatory definition is the following. Let n be a power of two and consider
a balanced binary tree of n − 1 nodes numbered 1 through n − 1 in symmetric order
(thus, 1 is the leftmost leaf and n − 1 the rightmost). The height of node i is rnz(i),
the position of the rightmost non-zero bit in the binary representation of i, counting
from 0. We denote by $\mathrm{LCA}_{BT}(i, j)$ the least common ancestor of nodes i and j. For
the algorithms, we assume that $\mathrm{LCA}_{BT}(i, j)$ is computed in constant time. In fact, it
can be computed using standard machine instructions and the MSB (most significant
set bit) function; this function is implemented as an instruction in many processors, but
could also be provided by a precomputed table. Following is a useful property of the
rnz() function.
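One way to realize $\mathrm{LCA}_{BT}$ with word operations is sketched below (our derivation, not taken from the text). In symmetric order, the subtree of a node k of height h = rnz(k) is the interval [k − 2^h + 1, k + 2^h − 1], so the LCA of j and i is the node whose height is the smallest h that is at least rnz(j), rnz(i) and the index of the most significant bit in which j and i differ, and whose bits above position h agree with those of i. Python's `int.bit_length()` plays the role of MSB; the helper names are hypothetical.

```python
def rnz(i):
    """Height of node i >= 1: the index of its lowest set bit, counting from 0."""
    return (i & -i).bit_length() - 1


def lca_bt(j, i):
    """LCA of nodes j <= i in the complete binary tree on 1..n-1 (symmetric order).

    Returns 0 when j <= 0, following the convention stated after Lemma 2.
    """
    if j <= 0:
        return 0
    # An ancestor has more trailing zeros, so the heights of j and i also
    # bound the answer from below, not just the highest differing bit.
    h = max(rnz(j), rnz(i))
    if i != j:
        h = max(h, (i ^ j).bit_length() - 1)  # MSB of i XOR j
    # Keep the bits of i above position h, then set bit h.
    return ((i >> (h + 1)) << (h + 1)) | (1 << h)
```

With n = 16, `lca_bt(6, 9)` returns 8, the root; this is the node k used for the query FS(9, 1) in the example of Section 3.1.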
Lemma 2 If $j \le i$ are two nodes of the complete binary tree, and $k = \mathrm{LCA}_{BT}(j, i)$,
then $i - j + 1 < 2^{1+\mathrm{rnz}(k)}$ and $i - k + 1 \le 2^{\mathrm{rnz}(k)}$.

We omit the easy proof. Finally, for uniformity of notation, we define $\mathrm{LCA}_{BT}(j, i)$ for
$j \le 0 < i$ to be 0.
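Although the proof of Lemma 2 is omitted, both inequalities are easy to test exhaustively for small n. The sketch below is a sanity check, not a proof: it finds the LCA directly from the subtree intervals [k − 2^h + 1, k + 2^h − 1], h = rnz(k), an elementary property of symmetric numbering that we assume here, and all function names are hypothetical.

```python
def rnz(i):
    """Index of the lowest set bit of i >= 1."""
    return (i & -i).bit_length() - 1


def lca_by_intervals(j, i, n):
    """Lowest node k of the tree on 1..n-1 whose subtree interval contains j and i."""
    best = None
    for k in range(1, n):
        h = rnz(k)
        if k - (1 << h) + 1 <= j and i <= k + (1 << h) - 1:
            if best is None or h < rnz(best):
                best = k
    return best


def check_lemma2(n):
    """Check i - j + 1 < 2^(1 + rnz(k)) and i - k + 1 <= 2^rnz(k) for all 1 <= j <= i < n."""
    for i in range(1, n):
        for j in range(1, i + 1):
            k = lca_by_intervals(j, i, n)
            if not (i - j + 1 < 2 ** (1 + rnz(k)) and i - k + 1 <= 2 ** rnz(k)):
                return False
    return True
```

Calling `check_lemma2(n)` for n = 16, 32 or 64 exercises every pair of nodes in trees of those sizes.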

3 Basic constant-time-query algorithm 

In this section we describe an O(n log n)-time preprocessing algorithm for the (±1)FS
problem. Throughout this section and the sequel, we make the simplifying assumption
that n is a power of two.

Our description of the Basic algorithm has two steps. (1) The output of the prepro-
cessing algorithm is specified, and it is shown how to process an FS query in constant
time using this output. (2) The preprocessing algorithm is described. This order helps
to motivate the presentation.

3.1 Data structure and query processing 

For each $i$, $0 \le i < n$, the preprocessing algorithm constructs an array $B_i[1..f(i)]$, where
$f(0) = n$ and, for $i > 0$, $f(i) = 3 \cdot 2^{\mathrm{rnz}(i)}$. In $B_i[j]$ we store the answer to $\mathrm{FS}(i, a_i - j)$.

A query FS(i, x) is processed as follows (we assume that $x > a_0 - n$, for otherwise
the answer is immediate, due to the ±1 restriction: no element is that small).

(1) If $x \ge a_i$, return $i$.

(2) Let $d = a_i - x$. If $d \le f(i)$, return $B_i[d]$.

(3) Otherwise, let $k = \mathrm{LCA}_{BT}(i - d + 1, i)$; return $B_k[a_k - x]$.

Figure 1 demonstrates the structure for a 16-element array A, except that all the
arrays $B_i$ are truncated to 8 elements. In this example, the query FS(6, 3) is answered
immediately as $B_6[1] = 11$; the query FS(9, 1) is answered via Case (3): $k = 8$ and
$B_k[a_k - 1] = B_8[5] = 13$.

We now explain the algorithm. Correctness of Case (2) is obvious by the definition
of the structure. The correctness in Case (3) hinges on two claims. The first, Claim 3
below, shows that the reference to $B_k[a_k - x]$ is within bounds; the second, Claim 4,
shows that the answer found there is the right one.

Claim 3 In Case (3), we have $0 < a_k - x < f(k)$.

Proof. For the first inequality: $k > i - d$ by its definition; we are dealing with
(±1)FS, therefore $a_k \ge a_i - (i - k) > a_i - d = x$. For the second inequality: we assume $k > 0$, as for
$k = 0$ we have $f(0) = n$ and the claim clearly holds (recall that $x > a_0 - n$). Consider the complete binary tree of $n - 1$ nodes,
used to define $\mathrm{LCA}_{BT}(i, j)$. The algorithm sets $k = \mathrm{LCA}_{BT}(i - d + 1, i)$, so by Lemma 2

$$2^{1+\mathrm{rnz}(k)} > i - (i - d + 1) = d - 1 = a_i - x - 1,$$
$$2^{\mathrm{rnz}(k)} \ge i - k + 1,$$
$$\Longrightarrow\quad 3 \cdot 2^{\mathrm{rnz}(k)} > a_i - x + i - k.$$

Since the difference between consecutive elements is ±1, we have $a_k \le a_i + i - k$, so we
conclude that

$$3 \cdot 2^{\mathrm{rnz}(k)} > a_k - x.$$

□

Figure 1: The basic FS structure.

Claim 4 If $i - a_i + x < k \le i$, then $\mathrm{FS}(k, x) = \mathrm{FS}(i, x)$.

Proof. Because we are dealing with (±1)FS, the values $a_k, \ldots, a_i$ are all in the
interval $[a_i - (i - k), a_i + (i - k)]$. By assumption we have $a_i - (i - k) > x$. Thus, none of
$a_k, \ldots, a_i$ is at most x, so the answer to FS(k, x) lies beyond position i, and is also the answer to FS(i, x). □
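Claims 3 and 4 can also be exercised mechanically. The following randomized check, a sketch rather than a proof, builds (±1) arrays of length 32 with 0-based indexing, enumerates every query that reaches Case (3) under the assumption $x > a_0 - n$, and tests both claims. $\mathrm{LCA}_{BT}$ is computed with bit operations (an implementation choice of this sketch), FS is a linear scan returning None for "no answer", and all names are hypothetical.

```python
import random


def rnz(i):
    return (i & -i).bit_length() - 1


def lca_bt(j, i):
    """LCA_BT(j, i) for j <= i; 0 when j <= 0 (the convention of the text)."""
    if j <= 0:
        return 0
    h = max(rnz(j), rnz(i))
    if i != j:
        h = max(h, (i ^ j).bit_length() - 1)
    return ((i >> (h + 1)) << (h + 1)) | (1 << h)


def fs_naive(A, i, x):
    return next((j for j in range(i, len(A)) if A[j] <= x), None)


def f(i, n):
    return n if i == 0 else 3 * 2 ** rnz(i)


def check_claims_3_and_4(A):
    """True if Claims 3 and 4 hold for every Case-(3) query on the (+-1) array A."""
    n = len(A)
    for i in range(1, n):
        for x in range(A[0] - n + 1, A[i]):        # x > a_0 - n and x < a_i
            d = A[i] - x
            if d <= f(i, n):
                continue                            # answered by Case (2)
            k = lca_bt(i - d + 1, i)
            if not 0 < A[k] - x < f(k, n):          # Claim 3: index in range
                return False
            if fs_naive(A, k, x) != fs_naive(A, i, x):  # Claim 4: same answer
                return False
    return True


def random_pm1_array(n, seed):
    """A random array starting at 0 whose consecutive elements differ by exactly 1."""
    rng = random.Random(seed)
    A = [0]
    for _ in range(n - 1):
        A.append(A[-1] + rng.choice((-1, 1)))
    return A
```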

3.2 The preprocessing algorithm 

It is easy to verify that the size of the data structure is Θ(n log n). To construct it in
O(n log n) time, we perform a sweep from right to left; that is, for i = n − 1, n − 2, ..., 0,
we compute an array F[min A, ..., max A], where F[x] is the index of the first j ≥ i such
that $a_j = x$ (or 0 by default). Note that this is not the same as FS(i, x). Initializing F
for i = n − 1 is trivial, and updating it when i is decremented takes constant time. For
each i, $B_i$ is just a copy of an appropriate section of F: by the (±1) restriction, the first
element at or after position i that is at most $a_i - t$ equals $a_i - t$ exactly, so
$B_i[t] = F[a_i - t]$ for $1 \le t \le f(i)$. This completes the preprocessing.
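Putting Sections 3.1 and 3.2 together gives the following Python sketch of the basic structure. It assumes a 0-based (±1) array whose length is a power of two, uses None rather than 0 for "no such j", computes $\mathrm{LCA}_{BT}$ with bit operations, and keeps slot 0 of each $B_i$ unused so that indices match the text; class and method names are hypothetical.

```python
def rnz(i):
    """Index of the lowest set bit of i >= 1 (the height of node i)."""
    return (i & -i).bit_length() - 1


def lca_bt(j, i):
    """LCA_BT(j, i) for j <= i via bit operations; 0 when j <= 0."""
    if j <= 0:
        return 0
    h = max(rnz(j), rnz(i))
    if i != j:
        h = max(h, (i ^ j).bit_length() - 1)
    return ((i >> (h + 1)) << (h + 1)) | (1 << h)


class BasicFS:
    """Section 3 structure for (+-1) arrays: O(n log n) preprocessing, O(1) query.

    Assumptions: 0-based array whose length n is a power of two; None is
    returned when no j >= i has A[j] <= x; slot 0 of every B_i is unused.
    """

    def __init__(self, A):
        self.A, self.n = A, len(A)
        lo = min(A)
        first = [None] * (max(A) - lo + 1)   # the array F: first j >= i with A[j] == v
        self.B = [None] * self.n
        for i in range(self.n - 1, -1, -1):  # right-to-left sweep
            first[A[i] - lo] = i             # O(1) update when i is decremented
            f = self.n if i == 0 else 3 * 2 ** rnz(i)
            # By (+-1), FS(i, a_i - t) is the first position holding exactly a_i - t,
            # so B_i is a copy of a section of F.
            self.B[i] = [None] + [first[A[i] - t - lo] if A[i] - t >= lo else None
                                  for t in range(1, f + 1)]

    def query(self, i, x):
        A = self.A
        if x >= A[i]:                        # Case (1)
            return i
        if x <= A[0] - self.n:               # no element is that small
            return None
        d = A[i] - x
        if d < len(self.B[i]):               # Case (2): d <= f(i)
            return self.B[i][d]
        k = lca_bt(i - d + 1, i)             # Case (3)
        return self.B[k][A[k] - x]
```

On the 16-element array (0, 1, 2, 3, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3), which we chose to agree with the values quoted in the example of Section 3.1, `query(6, 3)` returns 11 directly from $B_6[1]$, and `query(9, 1)` goes through Case (3) with k = 8 and returns $B_8[5] = 13$.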

4 Improved constant-time-query algorithm 

In this section we describe an O(n)-time algorithm, based on the solution of the former
section together with the microset technique. The essence of the technique is to fix a
block length $b = \lfloor (\log n)/2 \rfloor$ and to sparsify the structure of the last section by using
it only on block boundaries, reducing its cost to O(n), while for intra-block queries
we use an additional data structure, the micro-structure. For presentation's sake, we
now provide a specification of the micro-structure and go on to describe the rest of the
structure. The implementation of the micro-structure will be dealt with in the following
section.

For working with blocks, without resorting to numerous division operators, we shall
write down some numbers (specifically, array indices) in a quotient-and-remainder
notation, $ib + j$, where it is tacitly assumed that $0 \le j < b$.

The Micro-Structure. This data structure is assumed to support in O(1) time the
following query: Micro.FS(ib + j, x): return the answer to FS(ib + j, x) provided that
it is less than (i + 1)b. Otherwise, return 0.

The FS Structure. For each $i$, $0 \le i < n/b$, our preprocessing algorithm now
constructs two arrays:

1. A near array $N_i[1, \ldots, 2b]$ such that $N_i[j]$ stores the answer to $\mathrm{FS}(ib, a_{ib} - j)$
(namely, the first $2b$ entries of $B_{ib}$ of the previous section).

2. A far array $F_i[1, \ldots, f(i)]$ such that

$$F_i[j] = \left\lfloor \mathrm{FS}(ib, a_{ib} - jb) / b \right\rfloor.$$

Thus, the arrays are not only sparsified, but (for the far arrays) their values are also
truncated. Referring to the example in Figure 1, we have $b = 2$, so near arrays
have 4 elements, e.g., $N_4 = (9, 10, 11, 12)$. The far array $F_4$ has $f(4) = 12$ entries:
$(5, 6, 0, \ldots, 0)$.

The following fact follows from the (±1) restriction and the definition of $F_i$.
Observation 5 If $F_i[j] = k$, then $a_{ib} - jb \le a_{kb} < a_{ib} - (j - 1)b$.
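The near and far arrays can be built directly from their definitions and Observation 5 tested on them. The sketch below does exactly that with a linear-scan FS, so it is far slower than the intended O(n) preprocessing and is meant only to make the definitions concrete. It uses 0-based blocks, None for "no answer" (where the text writes 0), f(0) = n as in Section 3, and hypothetical names.

```python
def rnz(i):
    return (i & -i).bit_length() - 1


def fs_naive(A, i, x):
    return next((j for j in range(i, len(A)) if A[j] <= x), None)


def near_and_far(A, b):
    """Near arrays N[i][1..2b] and far arrays F[i][1..f(i)] for each block i.

    Built straight from the definitions with a linear-scan FS (not linear time);
    slot 0 of each list is unused and None means 'no answer'.
    """
    n = len(A)
    N, F = [], []
    for i in range(n // b):
        a = A[i * b]
        N.append([None] + [fs_naive(A, i * b, a - j) for j in range(1, 2 * b + 1)])
        f = n if i == 0 else 3 * 2 ** rnz(i)
        far = [None]
        for j in range(1, f + 1):
            p = fs_naive(A, i * b, a - j * b)
            far.append(None if p is None else p // b)  # keep only the block number
        F.append(far)
    return N, F


def check_observation5(A, b):
    """Observation 5: F_i[j] = k implies a_ib - jb <= a_kb < a_ib - (j-1)b."""
    _, F = near_and_far(A, b)
    for i, far in enumerate(F):
        for j in range(1, len(far)):
            k = far[j]
            if k is not None and not (A[i * b] - j * b <= A[k * b] < A[i * b] - (j - 1) * b):
                return False
    return True
```

With b = 2 and A = (0, 1, 2, 3, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3), an array we chose to agree with the example values quoted in the text (Figure 1 itself is not reproduced here), the sketch gives $N_4 = (9, 10, 11, 12)$ and a far array $F_4$ with 12 entries that starts 5, 6 and continues with "no answer" entries.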

Query processing. A query FS(ib + j, x) is processed as follows (we assume once
again that $x > a_0 - n$).

(1) If $x \ge a_{ib+j}$, return $ib + j$.

(2) If $j = 0$ then

(2.1) If $x \ge a_{ib} - 2b$, return $N_i[a_{ib} - x]$.

(2.2) $d \leftarrow \lfloor (a_{ib} - x)/b \rfloor$;
if $d \le f(i)$ then

(2.2.1) $k \leftarrow F_i[d]$; return FS(kb, x).
else

(2.2.2) $l \leftarrow i - d + 1$; $k \leftarrow \mathrm{LCA}_{BT}(l, i)$; return FS(kb, x).

(3) (if $j > 0$)

$m \leftarrow \mathrm{Micro.FS}(ib + j, x)$;

if $m \ne 0$, return m, else return FS((i + 1)b, x).
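The query procedure can be sketched in Python as a loop in which each "recursive call" becomes a jump, in line with the observations that follow. Assumptions of this sketch: a 0-based (±1) array; len(A) and len(A)/b are powers of two; None plays the role of 0 as "no answer"; f(0) = n as in Section 3; the near and far arrays are filled by the right-to-left sweep of Section 3.2, applied at block boundaries; and Micro.FS is replaced by a plain scan of the rest of the block, standing in for either structure of Section 5. All names are hypothetical.

```python
def rnz(i):
    """Index of the lowest set bit of i >= 1 (the height of node i)."""
    return (i & -i).bit_length() - 1


def lca_bt(j, i):
    """LCA_BT(j, i) for j <= i via bit operations; 0 when j <= 0."""
    if j <= 0:
        return 0
    h = max(rnz(j), rnz(i))
    if i != j:
        h = max(h, (i ^ j).bit_length() - 1)
    return ((i >> (h + 1)) << (h + 1)) | (1 << h)


class BlockedFS:
    """Sketch of the Section 4 structure: near/far arrays on block boundaries.

    Assumptions: 0-based (+-1) array; len(A) and len(A) // b are powers of two;
    None means 'no answer'; Micro.FS is replaced by a scan of the block.
    """

    def __init__(self, A, b):
        self.A, self.b, self.n = A, b, len(A)
        lo = min(A)
        first = [None] * (max(A) - lo + 1)  # first[v - lo]: first position >= p holding v

        def at(v):
            return first[v - lo] if v >= lo else None

        nblocks = self.n // b
        self.N, self.F = [None] * nblocks, [None] * nblocks
        for p in range(self.n - 1, -1, -1):  # right-to-left sweep, as in Section 3.2
            first[A[p] - lo] = p
            if p % b == 0:                   # block boundary p = ib
                i, a = p // b, A[p]
                f = self.n if i == 0 else 3 * 2 ** rnz(i)
                self.N[i] = [None] + [at(a - j) for j in range(1, 2 * b + 1)]
                self.F[i] = [None] + [None if at(a - j * b) is None else at(a - j * b) // b
                                      for j in range(1, f + 1)]

    def micro(self, q, x):
        """Stand-in for Micro.FS: search only the rest of q's block."""
        end = (q // self.b + 1) * self.b
        return next((p for p in range(q, end) if self.A[p] <= x), None)

    def query(self, q, x):
        A, b = self.A, self.b
        if x <= A[0] - self.n:               # below every element, by (+-1)
            return None
        while True:                          # each 'recursive call' is a jump
            if x >= A[q]:                    # Case (1)
                return q
            i, j = divmod(q, b)
            if j > 0:                        # Case (3)
                m = self.micro(q, x)
                if m is not None:
                    return m
                q = (i + 1) * b
                if q >= self.n:
                    return None
            elif x >= A[q] - 2 * b:          # Case (2.1)
                return self.N[i][A[q] - x]
            else:                            # Case (2.2)
                d = (A[q] - x) // b
                if d < len(self.F[i]):       # Case (2.2.1): d <= f(i)
                    k = self.F[i][d]
                    if k is None:
                        return None
                    q = k * b
                else:                        # Case (2.2.2)
                    q = lca_bt(i - d + 1, i) * b
```

For example, on A = (0, 1, 2, 3, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 2, 3) with b = 2, `query(9, 1)` goes through Case (3), finds nothing in block 4, jumps to position 10, and returns $N_5[3] = 13$ at Case (2.1).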
The following observations justify this procedure, and also show that there is no real
recursion here: the recursive calls can actually be implemented as gotos and they never
loop.

(1) In Case (3), when the micro-structure does not yield the answer, it follows that
the element sought is further than $(i + 1)b$; therefore the recursive call is correct,
and will be handled at Case (2).

(2) In Case (2.2.1), we have (see Observation 5 above)

$$a_{kb} < a_{ib} - (d - 1)b < x + 2b$$

and

$$a_{kb} \ge a_{ib} - db \ge x.$$

Moreover, every element from position $ib$ up to, but excluding, position $\mathrm{FS}(ib, a_{ib} - db) \ge kb$
is greater than $a_{ib} - db \ge x$, so $\mathrm{FS}(kb, x) = \mathrm{FS}(ib, x)$. Therefore, the recursive call is handled
correctly at Case (2.1) (or at Case (1), if $a_{kb} = x$).

(3) For Case (2.2.2), we can show, as for the basic algorithm, that $\mathrm{FS}(kb, x) =
\mathrm{FS}(ib, x)$ (same proof as before), and that $0 < a_{kb} - x < f(k) \cdot b$, showing that the
recursive call falls back to Case (2.2.1) (or to an earlier case). The last inequality is proved as Claim 6.

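Claim 6 In Case (2.2.2), we have $a_{kb} - x < f(k) \cdot b$.

Proof. We assume $k > 0$ (for $k = 0$, the claim follows from $f(0) = n$ and $x > a_0 - n$). By Lemma 2,

$$2^{1+\mathrm{rnz}(k)} > i - l + 1 = d > \frac{a_{ib} - x}{b} - 1,$$
$$2^{\mathrm{rnz}(k)} \ge i - k + 1,$$
$$f(k) = 3 \cdot 2^{\mathrm{rnz}(k)} > \frac{a_{ib} - x}{b} - 1 + i - k + 1 = \frac{a_{ib} - x + ib - kb}{b} \ge \frac{a_{kb} - x}{b},$$

where the last inequality is justified by the (±1) property. □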

5 The Micro Structure 

The purpose of the micro structure is to support "close" queries, i.e., to return the answer
to FS(ib + j, x) provided that it is at most (i + 1)b. There are several ways to implement
this structure, with subtle differences in performance or ease of implementation. We
describe two.

5.1 Berkman and Vishkin's structure 

The basis for fast solution of in-block queries in [T] is the observation that, up to normalization,
there are fewer than $2^b$ different possible blocks. Normalization amounts to subtracting the
first element of the block from all elements, i.e., moving the "origin" to zero. Clearly,
a query on any array A, $\mathrm{FS}_A(j, x)$, is equivalent to $\mathrm{FS}_{A'}(j, x - a_0)$, where A' is the
normalized form of A. The bound $2^b$ follows from the (±1) restriction. This also allows
us to conveniently represent a block as a binary string of length $b - 1$ (which fits in a
word). We obtain the following solution.
Preprocessing: For every possible "small" array S of size b, beginning with 0 and
satisfying the (±1) restriction, build a matrix $M_S[b \times 2b]$ such that $M_S[j, x]$ is the
answer to $\mathrm{FS}_S(j, x)$ for every $0 \le j < b$ and $-b \le x < b$. As an identifier of S (to index
the array of matrices) we use the $(b - 1)$-bit representation of S. While preprocessing
an array A of size n for FS queries, we store for every $0 \le i < n/b$ the identifier $S[i]$ of
the block $(a_{ib}, \ldots, a_{(i+1)b-1})$.

Query: $\mathrm{Micro.FS}_A(ib + j, x)$ is answered by looking up $M_{S[i]}[j, x - a_{ib}]$ (returning 0 if
the second index is out of range).
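The table-based micro-structure just described can be sketched as follows. The sketch identifies a normalized block by its pattern of up and down steps (one possible $(b-1)$-bit encoding; the text does not fix one), fills each row of $M_S$ in O(b) time by tracking a running minimum so that a matrix costs $O(b^2)$, stores matrix entries as offsets inside the block, and returns None instead of 0. Function names are hypothetical.

```python
from itertools import product


def fs_matrix(S):
    """Rows M_S[j][x + b] = FS_S(j, x) for 0 <= j < b, -b <= x < b (None if absent)."""
    b = len(S)
    M = []
    for j in range(b):
        row = [None] * (2 * b)
        m = b                          # thresholds x >= m are already answered
        for p in range(j, b):
            if S[p] < m:               # new running minimum: p answers x in [S[p], m)
                for x in range(S[p], m):
                    row[x + b] = p
                m = S[p]
        M.append(row)
    return M


def block_id(A, start, b):
    """(b-1)-bit identifier: bit t-1 is set when A rises between start+t-1 and start+t."""
    ident = 0
    for t in range(1, b):
        if A[start + t] > A[start + t - 1]:
            ident |= 1 << (t - 1)
    return ident


def build_micro(A, b):
    """Matrices for all 2^(b-1) normalized blocks, plus the identifier S[i] of each block of A."""
    tables = [None] * (1 << (b - 1))
    for steps in product((-1, 1), repeat=b - 1):
        S = [0]
        for s in steps:
            S.append(S[-1] + s)
        tables[block_id(S, 0, b)] = fs_matrix(S)
    ids = [block_id(A, i * b, b) for i in range(len(A) // b)]
    return tables, ids


def micro_fs(A, b, tables, ids, q, x):
    """Micro.FS(ib + j, x) for x < A[q]: the answer if it lies inside block i, else None."""
    i, j = divmod(q, b)
    col = x - A[i * b] + b             # normalize by a_ib, then shift -b..b-1 to 0..2b-1
    if not 0 <= col < 2 * b:
        return None                    # x is below every element of the block
    r = tables[ids[i]][j][col]
    return None if r is None else i * b + r
```

Because the identifier is just the sequence of up and down steps, b − 1 bits suffice, and every block of A maps to exactly one precomputed matrix.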

Complexity: The query is obviously constant-time. For the preprocessing, creating the
identifier array S clearly takes O(n) time. The construction of a single matrix $M_S$ can
be done quite simply in $O(b^2)$ time, and altogether we get $2^b \cdot O(b^2) = O(n)$ time and
space.

5.2 A solution after Alstrup, Gavoille, Kaplan and Rauhe

Another implementation of the micro structure is suggested by an idea from [2]. In its
basic form, as we next describe, it is really independent of the division into blocks,
except that it only supports queries where the answer is close enough to the query
index.

For $i < j < n$, let

$$m(i, j) = \begin{cases} 1 & \text{if } a_j < \min\{a_i, \ldots, a_{j-1}\}, \\ 0 & \text{otherwise.} \end{cases}$$

From the (±1) property, one can easily deduce that $\mathrm{FS}_A(i, a_i - k)$ is precisely the index $j$
of the $k$th 1 in the sequence

$$m(i, i+1),\ m(i, i+2),\ \ldots$$

The solution to the micro-structure problem, based on this observation, follows:

Preprocessing: For every $0 \le i < n$, compute and store in an array entry $M[i]$ the $b$-bit
mask $(m(i, i+1), \ldots, m(i, i+b))$.

Query: $\mathrm{Micro.FS}_A(i, x)$ is answered (for $x < a_i$) by looking up the $(a_i - x)$th set bit in
$M[i]$. The answer is 0 if there is no such bit.

This query returns answers in positions up to $i + b$, rather than $\lceil i/b \rceil \cdot b$, which can
possibly result in a faster query. As an additional advantage, $b$ can be enlarged up to
the word size, saving both time and space (there is a certain caveat; see below).

Query Complexity: The query is constant-time if we have a constant-time implementation
of the function that locates the $i$th set bit in a word. In the absence of hardware
support, a precomputed table of size $O(2^b \cdot b)$ can be used (but this requires limiting
the value of $b$ as before).¹

¹ Another way, which is not constant-time in the RAM model, is to search for this bit using available
arithmetic/logical instructions. Since this is a tight loop without any memory access, it may be even
faster than a table access on a real computer.

Preprocessing Algorithm: To compute the mask array $M$, we scan $A$ from right to left
while maintaining two pieces of data: the mask corresponding to the current position
$i$, and a stack that includes the indices $\{j \mid m(i, j) = 1\}$ up to the end of $A$. Each time
the current position $i$ is decremented, $i$ is pushed onto the stack, possibly kicking off
the top two elements (specifically, if $a_{i+1} = a_i + 1$). The current mask is easily adjusted
in $O(1)$ time.

Clearly, the computation of M takes Θ(n) time, and this is also the space required.
Fischer and Heun [9] propose to apply this technique within microblocks; in other words,
to revert to the Berkman-Vishkin approach of maintaining a table indexed by the block
identifier, but to keep the mask table instead of an explicit answer matrix. This saves a
factor of b in the size of the micro structure, but is likely to be competitive in speed
only if the bit-finding operation we make use of is supported by hardware.
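A Python sketch of the mask-based micro-structure follows. It keeps only the b-bit window of the stack described above: when $a_{i+1} = a_i - 1$, position $i + 1$ becomes a new minimum and all entries for $i + 1$ carry over (a shift and a set bit); when $a_{i+1} = a_i + 1$, the first entry for $i + 1$, whose value is $a_i$, is dropped (the lowest set bit is cleared before the shift). This window-only update is our derivation for the (±1) case, not a transcription of the text's stack, and the select loop stands in for a hardware instruction or a lookup table; names are hypothetical.

```python
def build_masks(A, b):
    """M[i] holds m(i, i+1), ..., m(i, i+b) in bits 0..b-1 (bit t-1 <-> position i+t).

    Right-to-left scan for a (+-1) array, with O(1) word operations per step.
    """
    n = len(A)
    window = (1 << b) - 1
    M = [0] * n                       # M[n-1] = 0: nothing follows the last position
    for i in range(n - 2, -1, -1):
        prev = M[i + 1]
        if A[i + 1] < A[i]:
            # a_{i+1} = a_i - 1: position i+1 is a new minimum, and every entry
            # recorded for i+1 is still a new minimum as seen from i.
            M[i] = ((prev << 1) | 1) & window
        else:
            # a_{i+1} = a_i + 1: the first entry recorded for i+1 has value a_i,
            # which is not below a_i, so clear the lowest set bit.
            M[i] = ((prev & (prev - 1)) << 1) & window
    return M


def kth_set_bit(mask, k):
    """0-based index of the k-th (k >= 1) lowest set bit of mask, or None."""
    for _ in range(k - 1):
        mask &= mask - 1              # drop the lowest set bit
    if mask == 0:
        return None
    return (mask & -mask).bit_length() - 1


def micro_fs_mask(A, M, i, x):
    """FS(i, x) for x < a_i if the answer is at most i + b, else None."""
    t = kth_set_bit(M[i], A[i] - x)
    return None if t is None else i + t + 1
```

Because b only limits the mask width, the same code runs with b up to the machine word size, as the text suggests; Python integers would in fact accept any b.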

5.3 Saving memory 

In our description of the algorithm we aimed for simplicity while achieving the desired
asymptotic bounds: constant-time query together with O(n) space and preprocessing
time. If, for some practical reason, the constant in the O(n) space bound is of
importance, one can look for improvements, which are not hard to find. We list two simple
constant-factor improvements.

(1) The size f(i) of $B_i$ can be defined to be $2 \cdot 2^{\mathrm{rnz}(i)} + 1$ instead of $3 \cdot 2^{\mathrm{rnz}(i)}$. Moreover,
assuming that all $a_i \ge 0$ (as is the case when using FS to solve Level Ancestors),
we can use $\min(f(i), a_i)$. This eliminates $B_0$, and may give additional savings
further on, depending on the shape of the tree in the Level Ancestor problem.

(2) The size of the arrays E, L in the reduction of Level Ancestors to the Find-Smaller
problem can be cut in half by listing a vertex v in E only when it is visited by the Euler
tour for the last time (put otherwise, we list the vertices in post-order). It is still
true that the level-l ancestor of v is the first vertex u occurring after v such that
level(u) ≤ l. Thus, the reduction to Find-Smaller is still correct. However, the
resulting FS problem no longer enjoys the (±1) property. But it has a similar
property: for all i, $a_{i+1} \ge a_i - 1$. Interestingly, this suffices for implementing the
algorithm, at least with the micro-structure of Section 5.2. Thus, this saving in
memory incurs no loss in running time.
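The post-order reduction in item (2) can be tried out with the sketch below, which lists vertices in post-order and answers level-ancestor queries by a linear-scan Find-Smaller over the post-order levels (a placeholder for the FS structure). It assumes a parent array with −1 at the root and children visited in increasing vertex order; the names are hypothetical.

```python
def postorder_levels(parent):
    """Post-order list P of the vertices, their levels Lp, and pos[v] = index of v in P.

    Assumes parent[root] == -1 and children visited in increasing vertex order.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p != -1:
            children[p].append(v)
    level = [0] * n
    P = []
    stack = [(parent.index(-1), False)]
    while stack:
        v, expanded = stack.pop()
        if expanded:
            P.append(v)                      # all children are done: emit v
            continue
        stack.append((v, True))
        for c in reversed(children[v]):
            level[c] = level[v] + 1
            stack.append((c, False))
    pos = [0] * n
    for k, v in enumerate(P):
        pos[v] = k
    return P, [level[v] for v in P], pos


def level_ancestor_postorder(P, Lp, pos, v, l):
    """Level-l ancestor of v (0 <= l <= level(v)): first vertex from v on with level <= l."""
    j = next(k for k in range(pos[v], len(Lp)) if Lp[k] <= l)
    return P[j]
```

For the parent array [-1, 0, 0, 1, 1, 2, 3], the post-order is 6, 3, 4, 1, 5, 2, 0 with levels 3, 2, 2, 1, 2, 1, 0: consecutive levels may repeat or, in other trees, jump upward by more than one, but they never drop by more than one.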

Remark. This part of the solution is where the simplification with respect to [7]
is most significant, although the outline (an initial, non-optimal solution, followed by
the use of micro-blocks) is similar.

6 The Level-Descendant and Level-Successor Queries

Observation 1 can easily be turned from ancestors to descendants:

Observation 7 Let $l > \mathrm{level}(v)$. Vertex u is the first level-l descendant of vertex v if
and only if u is the first vertex after the first occurrence of v in the Euler tour such that
$\mathrm{level}(u) \ge l$, provided that this vertex is a descendant of v. If it is not, v has no level-l
descendant.

By this observation, the level descendant query reduces to a Find-Greater problem,
analogous to Find-Smaller and solved in the same way, plus a test of descendance. Thus,
to add this functionality, we use the same arrays E, L and add a vector F maintaining
the first occurrence of each vertex in the tour. We also need a search structure for
"Find Greater." This structure is, of course, completely symmetric to the Find-Smaller
structure, so no further explanation should be necessary (incidentally, the micro table
à la Berkman-Vishkin can be shared). Testing for descendance is easy: u descends from
v if and only if $F[v] \le F[u] \le R[v]$.
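A sketch of the level-descendant query follows, under hypothetical conventions (a parent array with −1 at the root, children visited in increasing vertex order). The Find-Greater step is a linear scan standing in for the structure symmetric to Find-Smaller, and the descendance test is the one given above; the function names are illustrative.

```python
def euler_first_last(parent):
    """Euler-tour arrays E, L plus the first (F) and last (R) occurrence of each vertex.

    Assumes parent[root] == -1 and children visited in increasing vertex order.
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p != -1:
            children[p].append(v)
    E, L, F, R = [], [], [0] * n, [0] * n
    stack = [(parent.index(-1), 0, 0)]       # (vertex, level, next child index)
    while stack:
        v, lev, k = stack.pop()
        if k == 0:
            F[v] = len(E)                    # first visit of v
        R[v] = len(E)                        # overwritten until the last visit
        E.append(v)
        L.append(lev)
        if k < len(children[v]):
            stack.append((v, lev, k + 1))
            stack.append((children[v][k], lev + 1, 0))
    return E, L, F, R


def level_descendant(E, L, F, R, v, l):
    """First level-l descendant of v (l > level(v)) via Observation 7, or None."""
    # Find-Greater by linear scan: the first tour position from F[v] with level >= l.
    j = next((k for k in range(F[v], len(L)) if L[k] >= l), None)
    if j is None:
        return None
    u = E[j]
    return u if F[v] <= F[u] <= R[v] else None   # descendance test
```

In the tree with parent array [-1, 0, 0, 2], the first level-2 vertex after vertex 1 is 3, which lies in the subtree of 2, so the test correctly reports that 1 has no level-2 descendant.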

The level successor query is handled similarly, by the following observation: 

Observation 8 Vertex u is the level successor of vertex v if and only if u is the first
vertex after the last occurrence of v in the Euler tour such that $\mathrm{level}(u) \ge \mathrm{level}(v)$.
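A linear-scan version of the level-successor query is sketched below, using hard-coded Euler-tour arrays for the small example tree with parent array [-1, 0, 0, 1, 1, 2, 3]. In the full structure the scan would be a Find-Greater query on L starting at R[v] + 1; the function name is hypothetical.

```python
def level_successor(E, L, R, v):
    """Level successor of v via Observation 8, or None (linear scan for illustration)."""
    lev = L[R[v]]                          # level of v
    for j in range(R[v] + 1, len(L)):
        if L[j] >= lev:                    # first return to v's level after v's subtree
            return E[j]
    return None


# Euler-tour arrays of the tree with parent array [-1, 0, 0, 1, 1, 2, 3].
E = [0, 1, 3, 6, 3, 1, 4, 1, 0, 2, 5, 2, 0]
L = [0, 1, 2, 3, 2, 1, 2, 1, 0, 1, 2, 1, 0]
R = [12, 7, 11, 4, 6, 10, 3]
```

Since levels change by exactly one per step and the tour leaves v's subtree through its parent, the first position after R[v] with level at least level(v) has level exactly level(v).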

7 Conclusion 

I described how to construct and query a data structure for answering Level Ancestor 
queries on trees. The algorithm is based on Berkman and Vishkin's Euler Tour technique 
and is, in essence, a simplification of their PRAM algorithm. In contrast to the original, 
this version of the algorithm is simple and practical. The algorithm was implemented 
in C by Victor Buchnik; the code can be obtained from Amir Ben-Amram. 

Another advantage of this algorithm is that it can be easily extended to support 
queries for Level Descendants and Level Successors. 